Ohio is the only state whose name doesn’t share any letters with the word “mackerel.” It’s strange, but it’s true.
But that isn’t the only pairing of a state and a word you can say that about — it’s not even the only fish! Kentucky has “goldfish” to itself, Montana has “jellyfish” and Delaware has “monkfish,” just to name a few.
What is the longest “mackerel?” That is, what is the longest word that doesn’t share any letters with exactly one state? (If multiple “mackerels” are tied for being the longest, can you find them all?)
Extra credit: Which state has the most “mackerels?” That is, which state has the most words for which it is the only state without any letters in common with those words?
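Before scaling up, the definition can be checked on the four fish named above. This is a minimal sketch, not part of the original notebook: the hard-coded `STATES` list and the helper name `states_without_shared_letters` are assumptions introduced here for illustration.

```python
# The 50 U.S. state names, hard-coded so the check runs offline (assumed list).
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def states_without_shared_letters(word):
    """Return every state whose name has no letter in common with `word`."""
    letters = set(word.lower())
    # isdisjoint() is True when the two collections share no element.
    return [state for state in STATES if letters.isdisjoint(state.lower())]


# A word is a "mackerel" when this list has exactly one entry.
for fish in ["mackerel", "goldfish", "jellyfish", "monkfish"]:
    print(fish, states_without_shared_letters(fish))
```

Each of the four words should come back with a single state, matching the pairings given in the puzzle statement.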
In [65]:
from collections import Counter
from joblib import Parallel, delayed
import json
import pandas as pd
import requests
To get our word list, we will use one provided by Peter Norvig. For the 50 states, I found a list on GitHub.
In [5]:
def get_word_list():
    r = requests.get('https://norvig.com/ngrams/word.list')
    word_list = [w.strip() for w in r.text.split()]
    return word_list
word_list = get_word_list()
len(word_list)
Out[5]:
In [11]:
def get_state_list():
    r = requests.get('https://gist.githubusercontent.com/tvpmb/4734703/raw/b54d03154c339ed3047c66fefcece4727dfc931a/US%2520State%2520List')
    state_dct_list = json.loads(r.text)
    return [s.get('name') for s in state_dct_list]
state_list = get_state_list()
state_list[:10]
Out[11]:
First, let's start simple by creating a function that takes a state name and a word and returns the set of letters they share.
In [18]:
def get_shared_letters(a, b):
    return set(a.lower()).intersection(set(b.lower()))
get_shared_letters('mackerel', 'Mississippi')
Out[18]:
Now we will make a function that takes a word and the list of states and returns a dictionary mapping each state name to the set of letters it shares with the word.
In [21]:
def get_state_to_shared_letters(word, state_list):
    return {state_name: get_shared_letters(word, state_name) for state_name in state_list}
get_state_to_shared_letters('mackerel', state_list)
Out[21]:
Now we can make a function that keeps only the states whose shared-letter set is empty.
In [22]:
def filter_empty_sets(state_to_shared_letters_dct):
    return {state_name: shared_set for state_name, shared_set in state_to_shared_letters_dct.items() if len(shared_set) == 0}
filter_empty_sets(get_state_to_shared_letters('mackerel', state_list))
Out[22]:
Finally, we can iterate through every word in the word list and keep the words for which exactly one state shares no letters.
In [56]:
def get_words_with_one_state_no_shared_letters(word_list, state_list):
    use_joblib_parallel = True
    if use_joblib_parallel:
        def _get_filtered_empty_sets(word):
            # Return (word, state) when exactly one state shares no letters; otherwise None.
            empty_set_dict = filter_empty_sets(get_state_to_shared_letters(word, state_list))
            if len(empty_set_dict) == 1:
                return (word, list(empty_set_dict.keys())[0])
        # Parallel returns one result per word, in order; None marks words that do not qualify.
        dct = Parallel(n_jobs=2)(delayed(_get_filtered_empty_sets)(word) for word in word_list)
        dct = {item[0]: item[1] for item in dct if item is not None}
    else:
        # Serial fallback that builds the same word -> state mapping.
        dct = {}
        for word in word_list:
            empty_set_dict = filter_empty_sets(get_state_to_shared_letters(word, state_list))
            if len(empty_set_dict) == 1:
                dct[word] = list(empty_set_dict.keys())[0]
    return dct
words_with_one_state_no_shared_letters = get_words_with_one_state_no_shared_letters(word_list, state_list)
len(words_with_one_state_no_shared_letters)
Out[56]:
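The search above builds a full intersection set for every word and state pair. As a possible alternative, here is a minimal single-process sketch that precomputes each state's letters once and stops checking a word as soon as a second letter-free state turns up. The function name `find_mackerels` and the toy inputs are assumptions made for illustration; this is not the notebook's own implementation and it has not been timed against it.

```python
def find_mackerels(words, states):
    """Map each word to its state when exactly one state shares no letters with it."""
    # Precompute the letter set of every state once (spaces removed).
    state_letters = {s: frozenset(s.lower().replace(" ", "")) for s in states}
    result = {}
    for word in words:
        letters = set(word.lower())
        lonely_state = None
        count = 0
        for state, s_letters in state_letters.items():
            if letters.isdisjoint(s_letters):
                count += 1
                if count > 1:
                    break  # a second match disqualifies the word
                lonely_state = state
        if count == 1:
            result[word] = lonely_state
    return result


# Toy input (hypothetical): five states and four words, just to show the shape.
toy_states = ["Ohio", "Kentucky", "Montana", "Delaware", "Texas"]
toy_words = ["mackerel", "goldfish", "pug", "oat"]
print(find_mackerels(toy_words, toy_states))
```

In the toy run, "pug" is dropped because several of the five states share no letters with it, and "oat" is dropped because every state shares at least one. With the real inputs, the call would be `find_mackerels(word_list, state_list)`.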
Find the longest words.
In [63]:
for word in sorted(words_with_one_state_no_shared_letters.keys(), key=len, reverse=True)[:100]:
    print(word, words_with_one_state_no_shared_letters[word])
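Since the puzzle asks for every word tied for the longest, it may help to filter on the maximum length directly rather than reading ties off a top-100 listing. The sketch below assumes a dictionary shaped like `words_with_one_state_no_shared_letters` (word to state); the helper name `longest_words` is hypothetical, and the sample data is only the four pairings from the puzzle statement, not the actual answer.

```python
def longest_words(word_to_state):
    """Return the subset of the mapping whose keys have the maximum length."""
    if not word_to_state:
        return {}
    max_len = max(len(word) for word in word_to_state)
    return {w: s for w, s in word_to_state.items() if len(w) == max_len}


# Sample data taken from the puzzle statement only.
sample = {
    "mackerel": "Ohio",
    "goldfish": "Kentucky",
    "jellyfish": "Montana",
    "monkfish": "Delaware",
}
print(longest_words(sample))
```

On the real results, `longest_words(words_with_one_state_no_shared_letters)` would return all tied words at once.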
Which states have the most words associated with them?
In [68]:
Counter(words_with_one_state_no_shared_letters.values()).most_common()
Out[68]: